We investigate and quantify salient features of the charge distributions on viral capsids. Our analysis combines the experimentally determined capsid geometry with simple models for the ionization of amino acids, thus yielding a detailed description of the spatial distribution of positive and negative charge across the capsid wall. The obtained data are processed to extract the mean radii of the distributions, the surface charge densities, and the dipole moment densities. The results are evaluated in light of previously proposed models of capsid charge distributions, which are shown to be of somewhat limited value when applied to real viruses.



I. INTRODUCTION 

Starting from the early structural studies of tobacco mosaic virus gels, Bernal and Fankuchen [1] already invoked electrostatic interactions that are "probably due to the ionic atmospheres surrounding [viruses]" to explain their behavior in ionic solutions. Virus architecture, cell attachment, penetration, progeny assembly, and egress should all depend on long-range colloidal interactions between and within viruses and various other structural components of the cell [2]. Though the importance of electrostatic interactions in the context of viruses is well recognized (see the review by Siber et al. [3] and references therein) and electrostatic models at various levels of sophistication abound […], precious little systematic effort [13, …] has been directed towards a detailed quantification of the charge distributions on and within viral capsids. Models of electrostatic interactions in the context of viruses as well as virus-like nanoparticles [15, …] only make sense if they are derived from the detailed observed charge distributions on the epitopal (outer) and hypotopal (inner) surfaces […] of the capsid, as well as from the charge buried inside the capsomeres. Therefore, to evaluate previous modeling attempts, to propose better models, and to find out whether there is a prototypical charge distribution of a virus capsid, we embark on a detailed study of the charge distribution on empty viral capsids.

Our focus will not be on the distribution of charged amino acids along the 1D primary sequences of capsomeres [13] but exclusively on the 3D geometry of the charge distribution on the capsid. While the large-scale electronic structure of proteins, which would allow an assessment of the partial charge distribution buried inside the protein core, is presently unavailable [17, 18], the charges of the amino acids residing on the surface of the capsomers in contact with the aqueous solvent at physiological pH are known and readily available [19]. We will use the charge distribution on the epitopal and hypotopal capsid surfaces of a large number of viruses in order to analyze and model its statistical signature among the various virus types.

In order to describe any charge distribution, one first needs to identify the spatial region in which it resides and then quantify its geometry via a set of its lowest multipole moments [20]. With this goal in mind, we examine a number of available X-ray scattering and cryo-electron microscopy structural data sets on capsids of various viruses in order to extract a small set of parameters that characterize simple models of the charge distribution pertaining to these capsids. This minimal set of parameters includes the average size and thickness of the capsid, the surface charge density, and the magnitude of the surface dipole density of the charge distribution.

The structure of the paper is as follows. We first explain how we construct two simple capsid models from the experimental data and obtain the parameters pertaining to them. We then briefly analyze the geometrical properties of the two models before proceeding to the monopolar and dipolar charge distributions on the capsids. We focus on the different surface charge distributions pertaining to both models, and on the effect of the charge on the disordered protein N-tails. Lastly, we consider the surface dipole density in capsids, and conclude with a discussion of our results.



II. FROM STRUCTURES TO MODEL(S) 

We focus on the two simple models that are most widely used: a single, infinitely thin charged shell of radius $R_m$ and surface charge density $\sigma$, as shown in Fig. 1a […], and two thin shells of inner and outer radius $R_{in}$ and $R_{out}$ (giving a capsid thickness of $\delta_m = R_{out} - R_{in}$), carrying surface charge densities $\sigma_{in}$ and $\sigma_{out}$ (Fig. 1b) [3, 5]. We will refer to the two models as the single-shell and the double-shell model, respectively. Besides the monopole (total) charge distribution, we also consider the dipole distribution on such model capsids. The analysis is done solely for empty viral capsids that do not encapsidate any genetic material.

Figure 1: Schematic representation of the single-shell and double-shell models treated in the paper. Left: single-shell model with mean radius $R_m$ and surface charge density $\sigma$. Right: double-shell model with the inner shell of radius $R_{in}$ and the outer shell of radius $R_{out}$. The surface charge densities pertaining to the two shells are denoted by $\sigma_{in}$ and $\sigma_{out}$.

In our analysis we use experimental data deposited in the VIPERdb database [51]. This allows us to construct three-dimensional structures of viral capsids, from which we obtain the various mass and charge distributions within the capsid. We consider not only the distribution of atoms inside a capsid, but also the distribution of amino acids (their positions taken as the centers of mass of their constituent atoms) and of complete protein chains.

Some capsid data do not contain the positions of all atoms but only the positions of the alpha carbons; in such cases we equate the positions of the amino acids with those of their alpha carbons. Due to the methods of detection, hydrogen atoms are also not included in the experimental data. We have tested the effect of the missing hydrogen atoms on our analysis by adding them via the MolProbity web server [55] to several different capsid entries. As expected, their effect on the mass distributions is negligible, and we neglected them throughout our analysis.

To obtain the charge distributions of the capsids, we extract the positions of charged amino acids from the experimental data using the Tcl scripting language in VMD [53]. At a physiological pH of 7.4 we consider the following amino acids as charged [23]: aspartic acid (ASP) and glutamic acid (GLU), carrying a charge of $-1.0\,e_0$; lysine (LYS) and arginine (ARG), carrying a charge of $+1.0\,e_0$; and histidine (HIS), carrying a fractional charge of $+0.1\,e_0$ (where $e_0$ is the elementary charge).

The available experimental data cannot capture the usually disordered N-tails of proteins, which in certain cases carry a significant charge [11]. To estimate to what extent this affects our analysis, we also compare the capsid protein sequences of viruses deposited in VIPERdb with the full sequences obtained from the UniProt database of protein sequences [55].

In the following sections we extract and analyze the parameters of these simple models from the experimental data, examples of which are shown in Figs. 2 and 5. We analyze approximately 130 viruses from different families and compare their corresponding model charge and mass distribution parameters.

We classify the different viruses by their genome (single-stranded (ss) DNA and ssRNA on one hand, and double-stranded (ds) DNA and dsRNA on the other) [26] and by their Caspar-Klug triangulation number T [57, 55]. These are the most conspicuous properties that classify the analyzed viruses; there are others, for example the secondary and tertiary structure of capsid proteins (i.e. the presence of α-helices, β-barrels, ...). However, we expect such additional properties to play a smaller role in the task at hand [29], and their inclusion would yield no additional insight in our analysis. We consider separately the bacteriophages (which come with either a DNA or an RNA genome), as well as the T = p3 capsids [44] of RNA viruses (which are abundant in our sample), since they might differ in their properties [29].
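The charge assignment rule listed above is simple enough to state as code. The following Python sketch assumes that residues are identified by their three-letter PDB names, as they would be after extraction from a structure file; it is not the authors' Tcl/VMD script, and the names `RESIDUE_CHARGE`, `residue_charge` and `total_charge` are hypothetical.

```python
# Sketch of the per-residue charge assignment at pH 7.4 described in the text.
# Dictionary and function names are hypothetical, not the authors' scripts.
RESIDUE_CHARGE = {
    "ASP": -1.0,  # aspartic acid
    "GLU": -1.0,  # glutamic acid
    "LYS": +1.0,  # lysine
    "ARG": +1.0,  # arginine
    "HIS": +0.1,  # histidine, fractional charge
}


def residue_charge(resname: str) -> float:
    """Charge in units of e0 assigned to a residue; 0 for non-ionizable residues."""
    return RESIDUE_CHARGE.get(resname.upper(), 0.0)


def total_charge(resnames) -> float:
    """Net charge Q = sum_i q_i of a list of residue names."""
    return sum(residue_charge(name) for name in resnames)


if __name__ == "__main__":
    chain = ["MET", "LYS", "ASP", "HIS", "ARG", "GLU", "GLY"]
    print(f"Q = {total_charge(chain):.1f} e0")  # prints Q = 0.1 e0
```

Keeping the rule in a single lookup table makes it easy to swap in a different charging model (for example pH-dependent fractional charges) without touching the code that sums them.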



III. SINGLE- AND DOUBLE-SHELL MODELS 

We begin our analysis by constructing single-shell and double-shell models from the mass distributions in different viral capsids. The single, infinitely thin shell model is characterized by one parameter only, the mean capsid radius $R_m$ (Fig. 1a). The latter is extracted from the radial mass distribution in the capsid



where the angular coordinates have already been projected out. This can be done for the distribution of capsid atoms, of the centers of mass of amino acids, or of the centers of mass of proteins. The differences between these are within a couple of angstroms for most capsids, so we concern ourselves mainly with the distribution of capsid protein atoms.
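A minimal sketch of this step, assuming that $R_m$ is taken as the (mass-)weighted mean of the radial positions, by analogy with the mean charge radius $R_q$ of Eq. (2), and that the capsid center sits at the coordinate origin. Both are assumptions of this illustration, as are the function names. The histogram is normalized to unit area, as in the inset of Fig. 2.

```python
import math


def radial_distances(coords):
    """Distances (nm) of points (x, y, z) from the capsid center, assumed at the origin."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in coords]


def mean_radius(radii, weights=None):
    """Weighted mean radius; with atomic masses as weights this is one reading of R_m."""
    w = weights if weights is not None else [1.0] * len(radii)
    return sum(wi * ri for wi, ri in zip(w, radii)) / sum(w)


def radial_histogram(radii, r_min, r_max, n_bins, weights=None):
    """Bin centers and a density normalized so that the histogram area equals 1."""
    width = (r_max - r_min) / n_bins
    counts = [0.0] * n_bins
    w = weights if weights is not None else [1.0] * len(radii)
    for ri, wi in zip(radii, w):
        k = int((ri - r_min) // width)
        if 0 <= k < n_bins:  # points outside [r_min, r_max) are ignored
            counts[k] += wi
    area = sum(counts) * width
    centers = [r_min + (k + 0.5) * width for k in range(n_bins)]
    return centers, [c / area for c in counts]


if __name__ == "__main__":
    coords = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0), (6.0, 8.0, 0.0)]
    r = radial_distances(coords)             # [5.0, 5.0, 10.0]
    print(mean_radius(r))                    # 6.666666666666667
    print(mean_radius(r, [1.0, 1.0, 2.0]))   # 7.5
```

The same functions apply unchanged to amino-acid or protein centers of mass; only the input coordinates and weights differ.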

The double-shell model, on the other hand, is characterized by two radii, the inner (hypotopal) and the outer (epitopal) radius, $R_{in}$ and $R_{out}$ (Fig. 1b). Their difference is the capsid thickness $\delta_m = R_{out} - R_{in}$. These parameters are again obtained from the radial density distribution, with the thickness defined as the full width at half maximum (FWHM) of the distribution, and the inner and outer radius defined as the inner and outer half-maximum points of the distribution. The bin size of the distribution influences the result to some extent, but the effect is still smaller than the usual experimental precision. Also, since the exact half-maxima are never attained due to the discreteness of the distribution, the condition they have to satisfy is to lie within 5% of the half-maximum.
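The half-maximum criterion can be sketched as follows. Two details are our assumptions rather than statements from the text: the 5% tolerance is read as relative to the half-maximum value, and the search starts at the peak and moves outward in both directions, so that isolated interior bumps (such as N-tail roots) are not mistaken for the inner edge. The fallback used when the scan steps past the half-maximum without meeting the tolerance is likewise our choice.

```python
def half_max_radii(centers, density, tol=0.05):
    """Inner and outer half-maximum radii (R_in, R_out) of a binned radial distribution.

    Starting from the peak bin, scan inward and outward for the first bin whose value
    lies within tol * (half maximum) of the half maximum. If the scan steps past the
    half maximum without meeting the tolerance, the closer of the two straddling bins
    is returned (a fallback assumed here, not described in the text).
    """
    peak = max(range(len(density)), key=lambda k: density[k])
    half = density[peak] / 2.0

    def scan(indices):
        prev = peak
        for k in indices:
            if abs(density[k] - half) <= tol * half:
                return centers[k]
            if density[k] < half:
                closer = k if abs(density[k] - half) < abs(density[prev] - half) else prev
                return centers[closer]
            prev = k
        return None  # the distribution never falls to half maximum on this side

    r_in = scan(range(peak - 1, -1, -1))
    r_out = scan(range(peak + 1, len(density)))
    return r_in, r_out


if __name__ == "__main__":
    centers = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    density = [0.0, 0.3, 0.6, 1.0, 0.7, 0.4, 0.0]
    r_in, r_out = half_max_radii(centers, density)
    print(r_in, r_out, r_out - r_in)  # 3.0 6.0 3.0
```

The thickness $\delta_m$ then follows as `r_out - r_in`; rerunning with different bin widths is a quick way to check the bin-size sensitivity mentioned above.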

To illustrate how this analysis is done, we consider the example of cucumber mosaic virus (CMV, PDB ID 1f15). Figure 2 shows the radial mass distribution in the capsid, where we can see that the root parts of the protein N-tails, prominent in this example, protrude into the capsid interior as defined by the hypotopal radius of the distribution, $R_{in}$. Any significant outer protrusions such as spikes are located in the exterior of the capsid as defined by the epitopal radius $R_{out}$. These details are not included in the simpler single-shell model, characterized only by the mean capsid radius $R_m$.

Figure 2: Cross-section of the (experimentally determined) capsid mass distribution in the example of the cucumber mosaic virus (ssRNA) capsid (strain FNY), constructed from RCSB Protein Data Bank entry 1f15. The drawing was constructed with the procedure described in Ref. [3] with W = 1.34 nm and t = 0.85, where all amino acids were assigned strength ("$q/e_0$") 1. Protrusions can be seen on the capsid interior; these are the roots of the protein N-tails, and comparison with the full protein sequence shows that they are not complete. The inset shows the radial mass distribution (Eq. (1)) across the capsid, normalized so that the total area of the histogram equals 1; marked are the mean capsid radius $R_m$ (single-shell model) and the inner and outer radii $R_{in}$ and $R_{out}$ (double-shell model).

In Fig. 3 we next plot the inner and outer radius of the double-shell model for the entire dataset of analyzed viruses [35]. The capsid thickness follows naturally from the apparent linearity of their relation and is generally well defined. For more than 75% of the viruses in our sample the thickness is confined to a narrow range, $\delta_m \approx 1.5$ to $4.5$ nm.

To a good approximation, the mean capsid radius of the single-shell model increases with the square root of the capsid T-number, which means that one can idealize the capsid as consisting of uniformly distributed copies of a disk-shaped (or prism-shaped) elementary protein with a fixed area. A minimal model of this type for the equilibrium capsid structure, with explicit interactions between capsomeres on a spherical shell, has recently received much attention […].

An additional point of interest is the ratio of the capsid thickness to the mean capsid radius, $\delta_m/R_m$, as this can influence the validity of mechanical models of viruses, for instance continuum elasticity models of thin elastic shells […]. An analysis of this ratio is shown in Fig. 4. For the average virus analyzed, this ratio lies around 0.2, but it is (expectedly) no longer small for smaller, T = 1 viruses, where the idealization of a thin protein shell is misleading.

These characteristics of the capsid architecture turn out to be insensitive to taking the mass distribution instead of the position distribution, which barely affects the calculated mean radius or the thickness of the capsid. A more detailed analysis of the conserved geometrical properties of viruses and their elastic properties will be published elsewhere (A. Losdorfer Bozic, A. Siber, and R. Podgornik, in preparation).
Figure 3: Outer capsid radius compared to the inner capsid radius of the double-shell model. The thickness of the capsid emerges naturally from this linear dependence: the dashed line shows a thickness of 4.5 nm (i.e. $R_{out} = R_{in} + 4.5$ nm), and the dot-dashed line shows a thickness of 1.5 nm. Approximately two-thirds of the analyzed capsids have a thickness between 2 and 4 nm. Symbols encode different virus types: single-stranded genome (circles), double-stranded genome (squares), bacteriophages (diamonds), and T = p3 ssRNA viruses (triangles). The same symbols are used throughout the paper in other similar figures.

Figure 4: Ratio of the capsid thickness to the capsid mean radius, $\delta_m/R_m$, as a function of the triangulation number. The capsid thickness becomes increasingly comparable to the capsid radius as the triangulation number decreases.
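The square-root scaling of $R_m$ with T follows from a one-line geometric argument: a Caspar-Klug capsid contains $60T$ proteins, so if each covers a fixed area $a$, then $4\pi R_m^2 \approx 60\,T\,a$ and $R_m \propto \sqrt{T}$. The sketch below encodes this idealization and a least-squares fit of the prefactor $c$ in $R_m \approx c\sqrt{T}$. The area value is hypothetical and the fit through the origin is our choice of estimator, not the authors' procedure.

```python
import math


def ideal_capsid_radius(T: int, area_per_protein: float) -> float:
    """Radius (nm) of a sphere covered by 60*T proteins, each of fixed area (nm^2)."""
    return math.sqrt(60 * T * area_per_protein / (4 * math.pi))


def fit_sqrt_prefactor(T_values, radii):
    """Least-squares c in R_m ~ c * sqrt(T): c = sum(sqrt(T) * R) / sum(T)."""
    num = sum(math.sqrt(T) * R for T, R in zip(T_values, radii))
    return num / sum(T_values)


if __name__ == "__main__":
    a = 25.0  # hypothetical area per protein, nm^2
    Ts = [1, 3, 4, 7]
    radii = [ideal_capsid_radius(T, a) for T in Ts]
    for T, R in zip(Ts, radii):
        print(T, round(R, 2), round(R / radii[0], 3))  # last column is sqrt(T), rounded
    print(round(fit_sqrt_prefactor(Ts, radii), 3))
```

If the thickness $\delta_m$ is roughly independent of T, as the narrow range above suggests, the same idealization would make $\delta_m/R_m$ fall off as $1/\sqrt{T}$, which is qualitatively the trend seen in Fig. 4.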



IV. CHARGE DISTRIBUTIONS 

As specified in Section II, there are five amino acids in proteins that carry charge at physiological pH. However, there is some uncertainty as to whether these ionizable amino acids are charged or not when buried inside a protein. Either the dissociation cost for charges buried in the protein interior is too high, so that buried charges are virtually absent […], or the converse is true and the majority of ionizable amino acids buried inside the protein are ionized [32]. Yet another possibility is that the local environment of a buried ionizable amino acid is changed, so that its charge is modified [33, 34].

With this in mind, we consider two limiting cases. In the first one, we take all the ionizable amino acids as charged, no matter where they are located. In the second case, we consider as charged only the ionizable amino acids lying on the periphery of the capsid, i.e. inside the inner radius $R_{in}$ or outside the outer radius $R_{out}$ of the double-shell model. This is admittedly a simplified picture, but it enables us to cover the extreme cases. Only a complete ab initio quantum chemical calculation of the electronic properties of capsid proteins in contact with the aqueous solvent and neighboring proteins could resolve the issue of the correct charging model for the amino acids […].

A sample radial charge distribution is again shown for CMV in Fig. 5, with all the ionizable amino acids taken as charged, regardless of their position in the capsid. In this case, we observe that the charges on the hypotopal and epitopal surfaces are mostly positive and mostly negative, respectively; there are also some charges buried in the capsid wall. These are the only distinguishing features of an otherwise very complicated charge distribution. The distribution of charges in the capsid can vary significantly from virus to virus, and there appears to be no simple way of classifying them. One example of a very different distribution is shown in Fig. 6 for the case of simian virus 40 (PDB ID 1sva). Here, it is difficult to separate the charge distribution into a positively charged hypotopal surface and a negatively charged epitopal surface, and there is a good deal of charge variation within the capsid wall.




Figure 5: Cross-section of the charge distribution in the example of cucumber mosaic virus (PDB ID 1f15). The 3D representation is constructed as described in Ref. [3] with W = 1.34 nm and t = 0.85. The histogram shows the corresponding radial charge distribution across the capsid. Note that the 3D representation shows the negative (blue) and positive (red) charge densities separately, while the histogram shows the total charge density distribution, calculated by weighting both charge distributions. Since the negative and positive charge distributions overlap, they are infinitesimally shifted with respect to each other in order to show both clearly, so that on the right (left) half of the 3D representation the positive (negative) distribution is infinitesimally closer to the viewer. Marked are the capsid mean mass radius $R_m$ and the inner and outer radii $R_{in}$ and $R_{out}$ of the single- and double-shell models, respectively.



Figure 6: Cross-section of the charge distribution in the example of simian virus 40 (PDB ID 1sva). The figure is constructed in the same manner as Fig. 5. However, the radial charge distribution in this example cannot be easily categorized and, most notably, does not have a pronounced positively charged inner part and negatively charged outer part of the capsid. Marked are again the capsid mean mass radius $R_m$ and the inner and outer radii $R_{in}$ and $R_{out}$ of the single- and double-shell models, respectively.

A. Total Charge

The total charge of the capsid $Q$ is calculated as the sum of the charges of all charged amino acids in the capsid, $Q = \sum_i q_i$, within the two limiting models described above. We also introduce the mean radius of the distribution of absolute charge (mean charge radius) $R_q$,

$$R_q = \frac{\sum_i |q_i|\, r_i}{\sum_i |q_i|}, \qquad (2)$$

where $q_i$ are the charges of the amino acids located at radii $r_i$. For most of the viruses, the mean charge radius differs from the mean mass radius by up to a few percent.

The total charge of the single-shell model is usually given in terms of a surface charge density $\sigma$:

$$\sigma = \frac{Q}{4\pi R_m^2}; \qquad (3)$$

this is the surface charge density of the single-shell model with all the ionizable amino acids being charged. Here we could equally well use $R_q$ (Eq. (2)) instead of $R_m$, but we stick with the latter for consistency.
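For concreteness, here is a short sketch of the three quantities introduced so far: the net charge $Q = \sum_i q_i$, the mean charge radius $R_q = \sum_i |q_i| r_i / \sum_i |q_i|$, and the single-shell surface charge density $\sigma = Q/(4\pi R_m^2)$. The toy radii and charges are invented for illustration, and the function names are hypothetical.

```python
import math


def net_charge(charges):
    """Q = sum_i q_i (units of e0)."""
    return sum(charges)


def mean_charge_radius(radii, charges):
    """R_q = sum_i |q_i| r_i / sum_i |q_i|, weighting each residue by its absolute charge."""
    weights = [abs(q) for q in charges]
    return sum(w * r for w, r in zip(weights, radii)) / sum(weights)


def surface_charge_density(Q, R_m):
    """sigma = Q / (4 pi R_m^2); in e0/nm^2 when R_m is in nm."""
    return Q / (4 * math.pi * R_m ** 2)


if __name__ == "__main__":
    radii = [10.0, 12.0, 14.0]    # toy radial positions in nm
    charges = [1.0, -1.0, 0.1]    # LYS, ASP, HIS
    Q = net_charge(charges)
    print(round(Q, 2), round(mean_charge_radius(radii, charges), 3))  # 0.1 11.143
    print(surface_charge_density(Q, R_m=12.0))
```

Weighting by $|q_i|$ rather than $q_i$ keeps $R_q$ well defined even when positive and negative charges nearly cancel and $Q$ is close to zero.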

The surface charge densities of the double-shell model, $\sigma_{in}$ and $\sigma_{out}$, are similarly defined as

$$\sigma_{in} = \frac{Q_{in}}{4\pi R_m^2}, \qquad \sigma_{out} = \frac{Q_{out}}{4\pi R_m^2}. \qquad (4)$$

The charge on the inner shell is $Q_{in} = \sum_i q_i$ over all amino acids with $r_i < R_{in}$, and the charge on the outer shell, $Q_{out}$, is calculated in an analogous fashion over all amino acids with $r_i > R_{out}$. In order to compare the total charge of the two models, we also define

$$\sigma_{IO} = \frac{Q_{in} + Q_{out}}{4\pi R_m^2}. \qquad (5)$$

This can be considered as the surface charge density of the single-shell model with only the peripheral amino acids (i.e. those not buried inside the capsid wall as defined by the double-shell model) taken as charged.
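The second limiting case can be sketched directly from these definitions: charges with $r_i < R_{in}$ go into $Q_{in}$, charges with $r_i > R_{out}$ into $Q_{out}$, and charges inside the wall are dropped. Following the text, all densities are normalized by $4\pi R_m^2$. The toy data and the function name are hypothetical.

```python
import math


def double_shell_densities(radii, charges, r_in, r_out, r_m):
    """sigma_in, sigma_out and sigma_IO, all normalized by 4 pi R_m^2.

    Charges with r_in <= r <= r_out count as buried and are excluded; this
    realizes the second limiting case (only peripheral amino acids charged).
    """
    area = 4 * math.pi * r_m ** 2
    q_in = sum(q for r, q in zip(radii, charges) if r < r_in)
    q_out = sum(q for r, q in zip(radii, charges) if r > r_out)
    return q_in / area, q_out / area, (q_in + q_out) / area


if __name__ == "__main__":
    radii = [10.0, 12.0, 14.0, 16.0]    # toy positions in nm
    charges = [1.0, -1.0, -1.0, -1.0]   # two of them buried in the wall
    r_m = 13.0
    s_in, s_out, s_io = double_shell_densities(radii, charges, 11.0, 15.0, r_m)
    sigma_all = sum(charges) / (4 * math.pi * r_m ** 2)  # first limit: all charged
    print(s_io > sigma_all)  # True: dropping buried negative charge shifts sigma upward
```

The difference $\sigma_{IO} - \sigma$ is simply minus the buried charge divided by $4\pi R_m^2$, which is why excluding predominantly negative buried charges makes a capsid look more positive.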

The dependence of the total charge of the single-shell model in both limits ($\sigma$ and $\sigma_{IO}$) on the capsid T-number is shown in Fig. 7. We can see that for the majority of viruses the total charge becomes more positive when we exclude the buried charges. The values of the capsid surface charge densities mostly lie in the range from $-0.4$ to $+0.4\,e_0/\mathrm{nm}^2$. Invoking the previously obtained values of $R_m$, this implies net charges in the range $|Q| < 4500\,e_0$. Empty viral capsids are thus highly charged, and their interactions, either between themselves or with other structural components of the cell, must to a large extent be modulated by electrostatics.
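The step from surface charge density to net charge is just the single-shell relation $\sigma = Q/(4\pi R_m^2)$ inverted. As a purely illustrative check, assume a mean radius of $R_m = 30$ nm (a value chosen here for the arithmetic, not a figure quoted for any particular virus). The upper end of the quoted range then gives

$$|Q| = 4\pi R_m^2\,|\sigma| \le 4\pi\,(30\ \mathrm{nm})^2 \times 0.4\ e_0/\mathrm{nm}^2 \approx 4.5\times 10^{3}\,e_0,$$

which matches the order of magnitude of the bound above. At the same $\sigma$, smaller capsids carry proportionally less net charge, since $|Q|$ scales as $R_m^2$.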

In Fig. 8 we then compare the inner and outer surface charge densities of the double-shell model. An emerging feature, which can also be discerned from the histogram in Fig. 9, is that the outer charges of viruses are close to zero or slightly negative. By contrast, quite a few viruses carry a significant positive inner charge, even though many of them still carry an inner charge close to zero. The viruses carrying a positive inner charge in this case are mostly viruses with a single-stranded genome (with the exception of T = 1 and T = p3 capsids), as well as bacteriophages with a single-stranded genome.



B. Effect of Missing (Disordered) N-tails

The basic (positively charged) N-tails of capsid proteins are largely unresolved in X-ray scattering experiments, and Belyi and Muthukumar have shown that, due to their positive charges, they can interact strongly with the oppositely charged RNA genome. This interaction is also a major factor in constraining the length of the viral genome, implying a linear relation between the number of positive charges on the tails and the length of the encapsidated RNA [35, 36].

Figure 7: Distributions of the total capsid surface charge density depending on the triangulation number, taking into account either all the charged amino acids ($\sigma$; Eq. (3)) or only the charged amino acids lying outside the mean capsid thickness ($\sigma_{IO}$; Eq. (5)). In the latter case the capsids tend to carry slightly more positive charge; the relevant range of surface charge densities is in both cases well described by the interval $[-0.4, 0.4]\,e_0/\mathrm{nm}^2$.

The effect of the missing disordered tails in the experimental structure data can be most easily estimated from the changes in the total capsid charge brought about by the positively charged N-tails. The missing charge is calculated from the full primary sequences of the capsid proteins. Since nothing can be said about the positions of the missing residues (other than that they are most likely disordered and located on the hypotopal side of the capsid), we take all the missing charges to be located in the interior of the capsid, that is, inside $R_m$ or $R_{in}$ within the single- and double-shell models, respectively. This assumption should hold true for most of the analyzed viruses, but cannot be easily verified.

By adding the charge contributed by the N-tails we get an estimate of the charge correction $\Delta Q$. From it we obtain the new values of the total surface charge density $\sigma'$ in the single-shell model and of the inner surface charge density $\sigma'_{in}$ in the double-shell model. From the latter we can also obtain the corrected total surface charge density of Eq. (5), $\sigma'_{IO}$; all the surface charge densities are again normalized with $R_m$.
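The correction procedure can be sketched as follows. This illustration assumes the missing residues are identified by comparing the charge of the full UniProt-style sequence with that of the residues resolved in the structure, and that the shift is applied per protein copy and multiplied by the number of copies ($60T$ for a Caspar-Klug capsid). Sequence handling and names are hypothetical. As in the text, all of $\Delta Q$ is placed in the interior, so $\sigma_{out}$ is unchanged.

```python
import math

# Charges per one-letter amino-acid code, mirroring the pH 7.4 rule used in the text.
ONE_LETTER_CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": 0.1}


def sequence_charge(seq: str) -> float:
    """Net charge (e0) of a protein sequence written in one-letter code."""
    return sum(ONE_LETTER_CHARGE.get(aa, 0.0) for aa in seq.upper())


def n_tail_correction(full_seq: str, resolved_seq: str, n_copies: int) -> float:
    """Delta Q: charge of residues absent from the structure, summed over all copies."""
    return n_copies * (sequence_charge(full_seq) - sequence_charge(resolved_seq))


def corrected_densities(sigma, sigma_in, sigma_io, delta_q, r_m):
    """Place Delta Q in the capsid interior and renormalize by 4 pi R_m^2.

    sigma_out is unaffected because the missing charge is assumed to lie inside.
    """
    shift = delta_q / (4 * math.pi * r_m ** 2)
    return sigma + shift, sigma_in + shift, sigma_io + shift


if __name__ == "__main__":
    full = "MKRKRAAGDE"   # toy sequence with a basic N-terminal tail
    resolved = "AAGDE"    # residues actually present in the structure
    dq = n_tail_correction(full, resolved, n_copies=180)  # e.g. T = 3: 60 * 3 copies
    print(dq)  # 720.0
```

Comparing net charges, rather than aligning residues one by one, is the simplest choice when only the count of missing charges matters; it would need refinement if the positions of the missing segments were also of interest.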

The distributions of the new surface charge densities of the single-shell model as a function of the triangulation number are shown in Fig. 10. In general, a trend toward more positive charge is observed, with corrections of up to $|\Delta Q| \sim 6000\,e_0$. The same is true for the double-shell model, where the total surface charge density decomposes into the hypotopal and epitopal contributions (Figs. 8 and 9). This rescaling of the capsid charge, once explicit charges on the disordered N-tails are added, could be rationalized as stemming primarily from the strong electrostatic interactions between the N-tails and the genome. Since the genome is negatively charged, the hypotopal N-tails effectively act locally to completely screen this charge, conferring much needed stability to the virus.

The most pronounced and consistent changes can be observed in the case of viruses with a single-stranded genome, with a clear separation of the total charge between T = 3 single-stranded viruses and the rest, and a slightly less pronounced separation in the T = 1 viruses as well. The charges of the bacteriophages remain mostly unchanged after the explicit addition of N-tail charges, as do the charges of T = p3 ssRNA viruses. The latter case is somewhat surprising, as the majority of single-stranded viruses undergo an increase of charge. The effect on double-stranded viruses is less systematic.

From these results we conclude that the surface charge of the capsid is quite large, being comparable to the equivalent surface charge of a DNA molecule. In absolute terms the number of effective charges can go into tens of thousands, which is an impressive charge even after all the screening and condensation effects are taken into account, making viral capsids quintessential charged nano-objects [3]. The electrostatic interactions stemming from this large capsid charge are therefore important and cannot be neglected.

Figure 8: Top panel: comparison of the surface charge densities on the inner and outer shells of capsids (Eq. (4)). The majority of the viruses tend to have an at least slightly negatively charged outer shell. There is more diversity concerning the charge on the inner shell, which in our sample is centered around zero net charge, with viruses having either a negatively or a positively charged interior. Bottom panel: same as above, with the disordered N-tails of the proteins added. There is a noticeable shift of the inner shell charge (to which the missing sequences were attributed) towards more positive values.

Figure 9: Histogram showing the distribution of the inner and outer surface charge densities of the double-shell model in the sample of viruses used in the analysis. The upper part shows the outer surface charge density (blue), and the lower part shows the inner surface charge density without (red) and with (magenta) the added charge of the N-tails.

Figure 10: Surface charge density of the capsids with the disordered N-tails of the proteins added, plotted against the triangulation number, for both limiting cases considered (compare with Fig. 7). The total charge moves towards more positive values in both cases; this trend is least pronounced in bacteriophages and T = p3 ssRNA viruses.

C. Dipole Distribution

Lastly, we analyze the first higher-order multipole moment of the capsid charge distribution, the dipole moment. The electric dipole moment of the capsid shell is defined as

$$\mathbf{p} = \sum_i q_i\,(\mathbf{r}_i - \mathbf{r}_0),$$

where $q_i$ are again the charges of the amino acids located at positions $\mathbf{r}_i$ within the capsid shell. Unless the net charge vanishes, the dipole moment is not invariant with respect to the choice of origin and has to be calculated with respect to some particular reference point $\mathbf{r}_0$ [20]. We place the origin at the radius of the center of absolute charge, $R_q$. Apart from the absolute magnitude of the dipole moment, we again consider the surface dipole density, normalized with the capsid mean radius. The surface dipole density is completely analogous to the surface charge density introduced before. Since the dipole moment and its local surface density are vectors, we can decompose them into a radial and a tangential component (across and along the capsid wall) and compare their respective magnitudes.

We calculate the dipole moment for the basic asymmetric unit of a capsid: the conglomeration of T proteins which, upon applying the 60 rotation matrices of the icosahedral group, compose the entire capsid. This is done to simplify the analysis and to enable a good comparison of the results. In principle it would be possible to calculate the dipole moment of each capsid protein, but we believe this would not serve any additional purpose in our analysis. It would even make sense to calculate the dipole moments of dimers, trimers, pentamers, hexamers, or whatever the basic structural units of each capsid are [37], so as to see whether the dipole moment plays a role in their interaction. However, these units differ from virus to virus and would be difficult to address within our approach. In any case, we find that the magnitudes of the dipole moments of capsid proteins are small, and these effects are thus likely to be small as well.
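The dipole calculation for one asymmetric unit can be sketched as follows, using $\mathbf{p} = \sum_i q_i(\mathbf{r}_i - \mathbf{r}_0)$. Since the text fixes only the radius of the reference point ($R_q$), we assume here that $\mathbf{r}_0$ lies at that radius on the ray through the unit's absolute-charge-weighted centroid, and that this ray defines the local radial direction. The surface dipole density is taken as 60 unit dipoles spread over a sphere of radius $R_m$, which is also our reading. Function names are hypothetical.

```python
import math


def unit_dipole(positions, charges, r_q):
    """Dipole p = sum_i q_i (r_i - r0) of one asymmetric unit, plus radial/tangential parts.

    Assumption: r0 lies at radius r_q on the ray through the unit's
    absolute-charge-weighted centroid, and that ray defines the radial direction.
    Positions in nm, charges in e0, so p is in e0*nm.
    """
    w = [abs(q) for q in charges]
    centroid = [sum(wi * pos[k] for wi, pos in zip(w, positions)) / sum(w) for k in range(3)]
    norm = math.sqrt(sum(c * c for c in centroid))
    n_hat = [c / norm for c in centroid]  # outward radial unit vector
    r0 = [r_q * c for c in n_hat]
    p = [sum(q * (pos[k] - r0[k]) for q, pos in zip(charges, positions)) for k in range(3)]
    p_rad = sum(pk * nk for pk, nk in zip(p, n_hat))                     # across the wall
    p_tan = math.sqrt(max(sum(pk * pk for pk in p) - p_rad ** 2, 0.0))  # along the wall
    return p, p_rad, p_tan


def surface_dipole_density(p_magnitude, r_m, n_units=60):
    """Dipole per unit area (e0/nm): n_units dipoles spread over a sphere of radius R_m."""
    return n_units * p_magnitude / (4 * math.pi * r_m ** 2)


if __name__ == "__main__":
    # Toy unit: positive charge inside, negative charge outside the wall.
    p, p_rad, p_tan = unit_dipole([(0.0, 0.0, 9.0), (0.0, 0.0, 11.0)], [1.0, -1.0], r_q=10.0)
    print(p, p_rad, p_tan)  # [0.0, 0.0, -2.0] -2.0 0.0
```

In the toy example the dipole points inward (negative radial component), which is the sign one would expect for a capsid with a positive interior and a negative exterior.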

The majority of viruses have small surface dipole densities, below $0.02\,e_0/\mathrm{nm}$. For comparison, the surface dipole density of a completely oriented layer of water molecules at close packing would be $0.55\,e_0/\mathrm{nm}$. The obvious conclusion is that if there is any ordered water on the periphery of the capsid, its effect will overwhelm the intrinsic dipole moments of the capsid proteins. Note, however, that surface water ordering in "hydration layers" would be highly contingent on the local protein charge distribution [38]. One should also remark that the dipole moment calculated above does not take into account the complete electronic structure of the proteins, with the implied partial charges within the protein cores that may eventually contribute to the total dipole moments of the capsid proteins. Regardless, compared to the monopolar surface charge density, the dipolar surface density seems to be much less important.
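The water reference value can be unpacked with a short estimate. Assume a gas-phase dipole moment of about 1.85 D per water molecule and hexagonal close packing of molecules of diameter $d \approx 0.28$ nm. Both numbers are assumptions for this check and are not given in the text. Then

$$p_w \approx 1.85\ \mathrm{D} \approx 1.85 \times 0.0208\ e_0\,\mathrm{nm} \approx 0.0385\ e_0\,\mathrm{nm}, \qquad A = \frac{\sqrt{3}}{2}\,d^2 \approx 0.068\ \mathrm{nm}^2,$$

$$P_w = \frac{p_w}{A} \approx 0.57\ e_0/\mathrm{nm}.$$

This is close to the quoted $0.55\,e_0/\mathrm{nm}$ and roughly $0.55/0.02 \approx 27$ times the upper bound found for most capsids, so the comparison holds with more than an order of magnitude to spare.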



V. DISCUSSION & CONCLUSIONS 

We have performed a detailed statistical analysis of the mass and charge distributions in approximately 130 empty viral capsids, and extracted the relevant parameters needed to construct simple single- and double-shell models of them. The complete list of analyzed viruses, their triangulation (T) numbers and genome types, as well as a compilation of the results presented in the paper, is available from the authors upon request.

The analysis of the charge distribution in capsids was based on several assumptions that are not universally valid but must at present be taken as given. In structuring our models we ignored the dependence of the dissociation constants of amino acids on the detailed molecular environment. Although including it would be possible in a case-by-case analysis [39], it would make our general approach completely opaque. Nevertheless, these features should be investigated in the context of an improved model that would consider the fully dissociated charge of amino acids on the solvent-accessible surface of a protein as well as the rearrangement of charges inside the protein due to quantum electron charge transfer [17]. However, the calculation of the latter is at present not feasible for such a large number of amino acids, and we thus focused only on the dissociated charge. Some viruses are also reported to be stable under in vitro conditions at non-physiological pH [30]. Apart from attesting to the importance of electrostatic interactions in the self-assembly of viruses, this also has implications for their charges.

Therefore, some approximation for calculating the dissociation charges of amino acids in a protein has to be made, and this can be done in several ways […]. We chose a straightforward and simple method for extracting the charges from 3D experimental data at a single value of the solution pH, which enabled us to perform a consistent and general analysis. It is only one possibility, though, and different approaches can yield quantitatively different results, especially if the variation of the solution pH is considered in full.

Within the limitations described above, we were able to quantify the radial capsid charge distribution, its corresponding surface charge densities, dipole moments, and some of their geometric properties. This is important information for anyone using simple models of viral capsids. The monopolar surface charge density of the capsids was found to be quite large when compared with other charged biomolecules, lying in the range $[-0.4, 0.4]\,e_0/\mathrm{nm}^2$. We have also shown that the disordered N-tails contribute significantly to the net charge of virus capsids, often changing its sign. Consequently, this also results in strongly positively charged interiors of ssRNA viruses, for which it has been suggested that the interior charge is correlated with the genome length […].

The dipolar charge contribution, on the other hand, turned out to be much smaller overall. It can nevertheless play an important role whenever the stabilization of high-energy structures hinges on important subdominant contributions. It is in fact this secondary dipolar density that most probably governs the short-range interactions between capsomeres [55].

Contrary to some of the capsid geometrical properties, the distribution of capsid charges does not seem to possess any regularity among viruses with similar triangulation numbers, genome types, or species, as was also observed by Michen and Graule […] in their study of the isoelectric points of viruses. The choice of the dataset used in such a study can certainly influence the result to some extent [12], and a future increase in the number and variety of available experimental data would undoubtedly improve the analysis.